[Data] Fix: Replace bare raise with TypeError in string concatenation#60795
[Data] Fix: Replace bare raise with TypeError in string concatenation#60795slfan1989 wants to merge 3 commits intoray-project:masterfrom
Conversation
Replace the bare `raise` statement in `_to_pa_string_input()` with a proper TypeError that includes a descriptive error message. This ensures string concatenation operations fail with clear feedback when given non-string inputs (e.g., numeric columns). Changes: - expression_evaluator.py: Add TypeError with descriptive message - test_arithmetic.py: Add test for invalid input type rejection Signed-off-by: slfan1989 <slfan1989@apache.org>
There was a problem hiding this comment.
Code Review
This pull request is a solid improvement, replacing a bare raise with a descriptive TypeError for invalid string concatenation operations. The code is also simplified by using early returns. I've added one suggestion to make the error message even more informative by including the specific data type of the invalid input, which aligns with the goal stated in the pull request description. The new unit test is a great addition to prevent regressions.
| raise TypeError( | ||
| "Expected string or string-like pyarrow Array/ChunkedArray for string " | ||
| f"concatenation, got {type(x).__name__}." | ||
| ) |
There was a problem hiding this comment.
The error message can be made more specific to better align with the goal stated in the PR description. When x is a pyarrow.Array or pyarrow.ChunkedArray, using x.type instead of type(x).__name__ will provide the underlying data type (e.g., int64), which is more informative for debugging.
| raise TypeError( | |
| "Expected string or string-like pyarrow Array/ChunkedArray for string " | |
| f"concatenation, got {type(x).__name__}." | |
| ) | |
| type_name = x.type if isinstance(x, (pa.Array, pa.ChunkedArray)) else type(x).__name__ | |
| raise TypeError( | |
| "Expected string or string-like pyarrow Array/ChunkedArray for string " | |
| f"concatenation, got {type_name}." | |
| ) |
python/ray/data/_internal/planner/plan_expression/expression_evaluator.py
Show resolved
Hide resolved
Replace the bare `raise` statement in `_to_pa_string_input()` with a proper TypeError that includes a descriptive error message. This ensures string concatenation operations fail with clear feedback when given non-string inputs (e.g., numeric columns). Changes: - expression_evaluator.py: Add TypeError with descriptive message - test_arithmetic.py: Add test for invalid input type rejection Signed-off-by: slfan1989 <slfan1989@apache.org>
| if isinstance(x, (pa.Array, pa.ChunkedArray)) and _is_pa_string_like(x): | ||
| return _pa_decode_dict_string_array(x) | ||
| if isinstance(x, (pa.Array, pa.ChunkedArray)): | ||
| actual_type = str(x.type) | ||
| else: | ||
| raise | ||
| return x | ||
| actual_type = type(x).__name__ | ||
| raise TypeError( | ||
| "Expected string or string-like pyarrow Array/ChunkedArray for string " | ||
| f"concatenation, got {actual_type}." | ||
| ) |
There was a problem hiding this comment.
Suggested cleanup:
if isinstance(x, (pa.Array, pa.ChunkedArray)) and _is_pa_string_like(x):
return _pa_decode_dict_string_array(x)
actual_type = str(x.type) if isinstance(x, (pa.Array, pa.ChunkedArray)) else type(x).__name__
raise TypeError(
"Expected string or string-like pyarrow Array/ChunkedArray for string "
f"concatenation, got {actual_type}."
)
There was a problem hiding this comment.
Thanks for the suggestions! I’ve updated the PR accordingly—could you please take another look?
Replace the bare `raise` statement in `_to_pa_string_input()` with a proper TypeError that includes a descriptive error message. This ensures string concatenation operations fail with clear feedback when given non-string inputs (e.g., numeric columns). Changes: - expression_evaluator.py: Add TypeError with descriptive message - test_arithmetic.py: Add test for invalid input type rejection Signed-off-by: slfan1989 <slfan1989@apache.org>
Description
This PR fixes a bug in
_to_pa_string_input()where attempting to concatenate string columns with non-string columns (e.g., numeric types) would raise a bareRuntimeErrorinstead of a descriptiveTypeError.Changes:
raisestatement with properTypeErrorthat includes a clear error message indicating expected vs actual input typestest_string_concat_invalid_input_typeto verify the fixBefore: Bare
raisecaused crypticRuntimeError: No active exception to reraiseAfter: Clear
TypeError: Expected string or string-like pyarrow Array/ChunkedArray for string concatenation, got int64.Related issues
Additional information
Example of the fixed behavior: